Searching for a Measure of Word Order Freedom

Authors

  • Vladislav Kubon
  • Markéta Lopatková
  • Tomás Hercig
Abstract

This paper compares several ways of measuring word order freedom, applied to data from syntactically annotated corpora for 23 languages. The corpora are part of the HamleDT project; the word order statistics are the relative frequencies of all orderings of subject, predicate and object, in both main and subordinate clauses. The measures include Euclidean distance, max-min distance, entropy and cosine similarity. The differences among the measures are discussed.
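The abstract names four measures but does not give their formulas, so the following is only a minimal sketch of how such measures could be computed from the relative frequencies of the six subject/verb/object orders. It assumes that the Euclidean distance and the cosine similarity are taken against a uniform distribution (all six orders equally likely) and that max-min is the gap between the most and least frequent order; the paper may define the reference point differently. The function names and the counts in the usage example are hypothetical and are not taken from HamleDT.

```python
import math

# The six possible orders of subject (S), verb/predicate (V) and object (O).
ORDERS = ("SVO", "SOV", "VSO", "VOS", "OSV", "OVS")


def relative_frequencies(counts):
    """Turn raw counts per order into relative frequencies (summing to 1)."""
    total = sum(counts.get(order, 0) for order in ORDERS)
    if total <= 0:
        raise ValueError("at least one clause must be counted")
    return [counts.get(order, 0) / total for order in ORDERS]


def euclidean_from_uniform(freqs):
    """Euclidean distance from the uniform distribution (assumed reference)."""
    uniform = 1.0 / len(freqs)
    return math.sqrt(sum((f - uniform) ** 2 for f in freqs))


def max_min_distance(freqs):
    """Gap between the most and the least frequent order."""
    return max(freqs) - min(freqs)


def entropy(freqs):
    """Shannon entropy in bits; zero-frequency orders contribute nothing."""
    return -sum(f * math.log2(f) for f in freqs if f > 0)


def cosine_to_uniform(freqs):
    """Cosine similarity between the frequencies and the uniform vector."""
    uniform = 1.0 / len(freqs)
    dot = sum(f * uniform for f in freqs)
    norm_freqs = math.sqrt(sum(f * f for f in freqs))
    norm_uniform = math.sqrt(len(freqs) * uniform * uniform)
    return dot / (norm_freqs * norm_uniform)


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    example_counts = {"SVO": 700, "SOV": 50, "VSO": 120,
                      "VOS": 40, "OSV": 30, "OVS": 60}
    freqs = relative_frequencies(example_counts)
    print("euclidean:", euclidean_from_uniform(freqs))
    print("max-min:  ", max_min_distance(freqs))
    print("entropy:  ", entropy(freqs))
    print("cosine:   ", cosine_to_uniform(freqs))
```

Under these assumptions, a language with completely rigid order gets the largest Euclidean and max-min values and zero entropy, while a language that uses all six orders equally often gets zero distance, maximal entropy (log2 of 6 bits) and cosine similarity 1. The measures therefore point in opposite directions and have different ranges, which is one reason a comparison between them is worthwhile.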


Similar resources

On Formalization of Word Order Properties

This paper contains an attempt to formalize the degree of word order freedom for natural languages. It exploits the mechanism of the analysis by reduction and defines a measure based on a number of shifts performed in the course of the analysis. This measure helps to understand the difference between the word order complexity (how difficult it is to parse sentences with more complex word order)...


Examining the Relationship between Preordering and Word Order Freedom in Machine Translation

We study the relationship between word order freedom and preordering in statistical machine translation. To assess word order freedom, we first introduce a novel entropy measure which quantifies how difficult it is to predict word order given a source sentence and its syntactic analysis. We then address preordering for two target languages at the far ends of the word order freedom spectrum, Ger...


Quantifying Word Order Freedom in Dependency Corpora

Using recently available dependency corpora, we present novel measures of a key quantitative property of language, word order freedom: the extent to which word order in a sentence is free to vary while conveying the same meaning. We discuss two topics. First, we discuss linguistic and statistical issues associated with our measures and with the annotation styles of available corpora. We find th...


Diachronic Trends in Word Order Freedom and Dependency Length in Dependency-Annotated Corpora of Latin and Ancient Greek

One easily observable aspect of language variation is the order of words. In human and machine natural language processing, it is often claimed that parsing free-order languages is more difficult than parsing fixed-order languages. In this study on Latin and Ancient Greek, two well-known and well-documented free-order languages, we propose syntactic correlates of word order freedom. We apply our ...


An Intensity Measure for Seismic Input Energy Demand of Multi-Degree-of-Freedom Systems

Nonlinear dynamic analyses are performed to compute the maximum relative input energy per unit mass for 21 multi-degree-of-freedom (MDOF) systems with preselected target fundamental periods of vibration ranging from 0.2 to 4.0 s and 6 target inter-story ductility demands of 1, 2, 3, 4, 6 and 8, subjected to 40 earthquake ground motions. The efficiency of the several intensity measures as an ind...



Publication date: 2016